Personal Loan Campaign

Problem Statement

Objective

Data

Loan_Modelling.csv - the raw data that is used for the project.

Import the necessary packages

Read the dataset

Data background and contents - Data Structure Overview

Remove ID column

Convert ZIPCode variable to county

Understand the shape of the data

Check the data types of the columns for the dataset

Check for duplicates

Check summary statistics of the dataset

Check missing values in dataset

Univariate Data Analysis

Observation on Age

Observation on Experience

Observation on Income

Observation on Family

Observation on CCAvg

Observation on Education

Observation on Mortgage

Observations on non-numerical variables

Observation on Personal_Loan

Observation on Securities_Account

Observation on CD_Account

Observation on Online

Observation on CreditCard

Observation on County

Bivariate Data Analysis

Personal_Loan with Age

Personal_Loan with Experience

Personal_Loan with Income

Personal_Loan with County

Personal_Loan with Family

Personal_Loan with CCAvg

Personal_Loan with Education

Personal_Loan with Mortgage

Personal_Loan with Securities_Account

Personal_Loan with CD_Account

Personal_Loan with Online

Personal_Loan with CreditCard

Summary of EDA

Data Description:

Univariate Data Analysis:

Bivariate Data Analysis:

Data Pre-Processing

Outlier Treatment
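
The outline does not say which outlier treatment was applied. As an illustration only, one common approach is to cap values at the IQR whiskers. The sketch below assumes that approach; the function name `cap_outliers_iqr`, the multiplier `k=1.5`, and the sample values are hypothetical and not taken from the project.

```python
import statistics

def cap_outliers_iqr(values, k=1.5):
    """Clip each value to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles computed with the inclusive method (data treated as a full population).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]

# Made-up sample with one extreme value (not from Loan_Modelling.csv).
incomes = [20, 35, 40, 45, 50, 55, 60, 65, 70, 400]
capped = cap_outliers_iqr(incomes)
print(capped)
```

Capping keeps every row in the dataset, which matters when the positive class is rare; dropping rows instead is another option with different trade-offs.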

Fix Experience variable

Summary of Data Pre-Processing

Outlier Treatment

Fix Experience variable

Model building - Logistic Regression

Define dependent variable

Create dummy variables

Split the data into train and test

Logistic Regression Model

Finding the coefficients

Coefficient Interpretations

Converting coefficients to odds
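
A logistic regression coefficient is a change in log-odds, so exponentiating it gives an odds ratio, and `(exp(coef) - 1) * 100` gives the percentage change in odds per unit increase of the feature. The sketch below illustrates that arithmetic; the coefficient values and the helper `odds_summary` are hypothetical and are not results from the fitted model.

```python
import math

# Hypothetical coefficients for illustration; not values from the fitted model.
coefficients = {"Income": 0.05, "Family": 0.70, "CreditCard": -1.00}

def odds_summary(coefs):
    """Map each feature to (odds ratio, percentage change in odds)."""
    summary = {}
    for name, beta in coefs.items():
        odds_ratio = math.exp(beta)          # multiplicative change in odds
        pct_change = (odds_ratio - 1) * 100  # same change expressed in percent
        summary[name] = (odds_ratio, pct_change)
    return summary

for name, (ratio, pct) in odds_summary(coefficients).items():
    print(f"{name}: odds ratio {ratio:.3f}, change in odds {pct:+.1f}%")
```

A positive coefficient yields an odds ratio above 1 (odds increase), a negative one yields a ratio below 1 (odds decrease), holding the other features fixed.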

Coefficient interpretations

Above are coefficient explanations for some columns. Interpretation for other columns can be done similarly.

Check model performances

Checking model performance on training set

Confusion matrix interpretation for training dataset

Checking model performance on testing set

Confusion matrix interpretation for testing dataset

ROC-AUC Evaluation

ROC-AUC on training dataset

ROC-AUC on testing dataset

ROC-AUC Interpretations on training and testing data

Model Performance Improvement

Optimal threshold using AUC-ROC curve
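
One widely used way to pick a threshold from the ROC curve is to maximize the difference between the true positive rate and the false positive rate (Youden's J statistic). Whether the notebook uses exactly this criterion is an assumption. The dependency-free sketch below shows the idea; the labels, scores, and function names are made up for illustration.

```python
def roc_points(y_true, y_score):
    """Return (threshold, TPR, FPR) for every distinct score used as a cutoff."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in sorted(set(y_score), reverse=True):
        # A sample is predicted positive when its score is >= the threshold.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        points.append((threshold, tp / positives, fp / negatives))
    return points

def optimal_threshold(y_true, y_score):
    """Pick the threshold that maximizes TPR - FPR (Youden's J)."""
    return max(roc_points(y_true, y_score), key=lambda p: p[1] - p[2])[0]

# Made-up labels and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
best = optimal_threshold(y_true, y_score)
print(best)
```

With scikit-learn arrays from `roc_curve`, the same selection is usually written as `thresholds[np.argmax(tpr - fpr)]`.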

Checking model performance on training set

Checking model performance on testing set

Precision-Recall curve

Checking model performance on training set

Checking model performance on testing set

Logistic Regression Model Performance Summary

Training performance comparison

-> Hence, F1 score is used to evaluate the model's performance.
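
For reference, F1 is the harmonic mean of precision and recall, so it penalizes a model that does well on only one of the two. A minimal sketch of the computation from confusion-matrix counts follows; the counts and the helper name `precision_recall_f1` are hypothetical and are not the model's results.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up counts: 80 true positives, 20 false positives, 40 false negatives.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```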

Model building - Decision Tree

Checking model performance on training set

Checking model performance on test set

Visualizing the Decision Tree

Pre-Pruning

Reducing overfitting - using GridSearch for hyperparameter tuning of the Decision Tree model

Checking performance on training set

Checking performance on testing set

Visualizing the Decision Tree

Observation

Interpretations from other decision rules can be made similarly

Post-Pruning

Recall vs alpha for training and testing sets

Check model performances

Checking model performance on training set

Checking model performance on test set

Visualizing the Decision Tree

Comparing all the decision tree models

Training performance comparison

Testing performance comparison

Observation

Conclusions

Business Recommendations